Skip to content

Conversation

@haoqing0110
Copy link
Member

@haoqing0110 haoqing0110 commented Nov 5, 2025

Summary

Related issue(s)

Fixes open-cluster-management-io/ocm#1237

Summary by CodeRabbit

  • Refactor

    • Reworked per-cluster timeout/soak handling and unified recheck scheduling so next-check timing is derived from per-cluster recheck times.
  • Tests

    • Updated and added tests covering recheck-time computation, mixed timeout/soak scenarios, and revised scheduling expectations across rollout strategies.

✏️ Tip: You can customize this high-level summary in your review settings.

@openshift-ci openshift-ci bot requested review from deads2k and qiujian16 November 5, 2025 08:03
@coderabbitai
Copy link

coderabbitai bot commented Nov 5, 2025

Walkthrough

Replaced TimeOutTime with RecheckTime in ClusterRolloutStatus, added calculateRecheckTime to compute next check times, and refactored recheck calculation so minRecheckAfter derives its result from per-cluster RecheckTime values. Tests updated accordingly.

Changes

Cohort / File(s) Summary
Core Rollout Logic
pkg/apis/cluster/v1alpha1/rollout.go
Replaced public field TimeOutTime *metav1.Time with RecheckTime *metav1.Time in ClusterRolloutStatus. Added calculateRecheckTime(startTime *metav1.Time, duration time.Duration) *metav1.Time. Refactored determineRolloutStatus() and related flows to set and use RecheckTime (soak end or timeout) and replaced uses of getTimeOutTime. Updated minRecheckAfter(rolloutClusters []ClusterRolloutStatus) *time.Duration to compute minimum from RecheckTime fields. Fixed comment typo.
Tests
pkg/apis/cluster/v1alpha1/rollout_test.go
Updated expected structs and test logic to use RecheckTime instead of TimeOutTime across rollout scenarios. Added TestMinRecheckAfter() to validate selection of the minimum RecheckTime (nil/empty/mixed cases). Adjusted time-based expectations to align with new recheck semantics.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect determineRolloutStatus() and callers to ensure RecheckTime is set for the correct statuses (succeeded soak end vs. timeout/soak for others).
  • Verify calculateRecheckTime nil-handling matches previous semantics where no timeout/soak is intended.
  • Confirm minRecheckAfter correctly computes the minimum positive duration from RecheckTime and that all callers were updated appropriately.
  • Review updated tests, especially TestMinRecheckAfter(), for comprehensive coverage of edge cases (nil, empty, mixed).

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description is incomplete. The Summary section is empty, and no specific issue number is provided in the 'Fixes #' field. Add a detailed summary of the changes and fill in the 'Fixes #' field with the issue number (1237) or update the description to match the template structure.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: setting RecheckTime when minSuccessTime is defined, which aligns with the refactoring to replace TimeOutTime with RecheckTime.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@haoqing0110
Copy link
Member Author

/assign @qiujian16
cc @youngbupark

@haoqing0110 haoqing0110 force-pushed the refactor-rollout-rechecktime branch from aab7c47 to efe5d8a Compare November 25, 2025 09:10
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/apis/cluster/v1alpha1/rollout_test.go (1)

2474-2548: Consider adding test case for past RecheckTimes.

The test coverage is good, but consider adding a test case where all RecheckTime values are in the past (before fakeTime). According to the implementation in rollout.go (line 614), only positive durations are considered, so this scenario should return nil. This would ensure the function correctly handles clusters that have already passed their recheck time.

Example test case to add:

{
    name: "All RecheckTimes in the past, should return nil",
    rolloutClusters: []ClusterRolloutStatus{
        {ClusterName: "cluster1", Status: Succeeded, RecheckTime: &fakeTime_60s},
        {ClusterName: "cluster2", Status: Progressing, RecheckTime: &fakeTime_30s},
    },
    expectedRecheckAfter: nil,
},
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aab7c47 and efe5d8a.

📒 Files selected for processing (2)
  • pkg/apis/cluster/v1alpha1/rollout.go (8 hunks)
  • pkg/apis/cluster/v1alpha1/rollout_test.go (26 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/apis/cluster/v1alpha1/rollout_test.go (1)
pkg/apis/cluster/v1alpha1/rollout.go (8)
  • Failed (33-33)
  • Progressing (29-29)
  • ToApply (27-27)
  • ClusterRolloutStatus (42-57)
  • TimeOut (36-36)
  • Succeeded (31-31)
  • RolloutResult (62-73)
  • RolloutClock (19-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: verify
  • GitHub Check: integration
  • GitHub Check: unit
🔇 Additional comments (10)
pkg/apis/cluster/v1alpha1/rollout_test.go (4)

145-155: LGTM!

The test expectations correctly reflect the new RecheckTime semantics:

  • Progressing/Failed clusters have RecheckTime set to when their timeout will occur
  • TimeOut clusters have RecheckTime set to when they timed out
  • RecheckAfter is correctly derived as the minimum recheck time

504-563: LGTM!

The test correctly validates the interaction between minSuccessTime (soak period) and timeout behavior:

  • cluster1 is in its soak period with RecheckTime set to when soak ends
  • cluster3 is progressing with RecheckTime set to when timeout occurs
  • cluster2 has timed out with RecheckTime recording when the timeout occurred
  • RecheckAfter correctly derives the minimum recheck time (30s)

564-630: Excellent test coverage for mixed timeout and minSuccessTime scenarios.

This test case comprehensively validates the interaction between:

  • Clusters in their soak period (cluster1)
  • Clusters progressing with different remaining timeout durations (cluster2, cluster3)
  • Clusters that have completed their soak period (cluster4, correctly excluded)

The inline comments clearly explain each cluster's expected state and the RecheckAfter calculation is correct (minimum of 30s from cluster1 and cluster2).


1639-1697: Good coverage for progressivePerGroup with minSuccessTime.

This test validates the correct handling of soak periods in progressivePerGroup rollouts:

  • cluster1 in soak period is correctly included with RecheckTime
  • cluster2 outside soak period is correctly excluded
  • cluster3 progressing has timeout-based RecheckTime
  • RecheckAfter correctly derives the minimum (30s)

The inline comments effectively document the test expectations.

pkg/apis/cluster/v1alpha1/rollout.go (6)

53-56: LGTM!

The RecheckTime field documentation clearly describes its dual purpose for both soak period tracking (succeeded status) and timeout tracking (progressing/failed status), which aligns with the implementation.


500-515: LGTM!

The calculateRecheckTime helper function correctly handles all cases:

  • Returns nil when duration is maxTimeDuration (no timeout/soak period configured)
  • Uses current time when startTime is nil
  • Adds duration to startTime when provided

This is a clean, general-purpose replacement for the previous getTimeOutTime function.


461-498: LGTM!

The determineRolloutStatus function correctly implements RecheckTime tracking:

Succeeded status (lines 472-480):

  • Calculates when soak period ends using calculateRecheckTime
  • Sets RecheckTime when cluster is within soak period
  • Allows the cluster to be rechecked when soak completes

Progressing/Failed status (lines 486-494):

  • Calculates when timeout occurs
  • Sets RecheckTime before status determination
  • For timed-out clusters, RecheckTime records when timeout occurred
  • For progressing clusters, RecheckTime indicates when to check for timeout

The implementation correctly supports the dual semantics of RecheckTime for both soak periods and timeouts.


606-623: LGTM!

The refactored minRecheckAfter function is cleaner and more maintainable:

Key improvements:

  • Simplified signature by removing minSuccessTime parameter (information now encoded in RecheckTime)
  • Derives minimum recheck time directly from RecheckTime fields
  • Correctly filters out past recheck times (line 614: recheckDuration > 0)
  • Returns nil when no future recheck times exist

Semantics:

  • Works for both timeout tracking (Progressing/Failed) and soak period tracking (Succeeded)
  • Handles mixed scenarios correctly by finding the minimum across all cluster states

This aligns well with the per-cluster RecheckTime approach.


311-366: LGTM!

All call sites of minRecheckAfter in progressivePerCluster are correctly updated to pass only rolloutClusters, consistent with the new signature. The RecheckAfter field in the returned RolloutResult is properly populated in all code paths.


428-446: LGTM!

All call sites of minRecheckAfter in progressivePerGroup are correctly updated to pass only rolloutClusters, consistent with the new signature. The RecheckAfter field is properly populated in both return paths.

@qiujian16
Copy link
Member

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm label Nov 26, 2025
@openshift-ci
Copy link

openshift-ci bot commented Nov 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haoqing0110, qiujian16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit a185f88 into open-cluster-management-io:main Nov 26, 2025
13 checks passed
@haoqing0110 haoqing0110 deleted the refactor-rollout-rechecktime branch November 26, 2025 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

enhance the MWRS rollout gaps

2 participants